Fairness preserving byzantine agreements

ABSTRACT

A technique is disclosed for building agreement among a plurality of servers that receive transactions from clients. The technique includes each server broadcasting its received transaction to all other servers. Each server uses the set of transactions that it received from all servers (including its own transaction) to produce an echo that represents the set of transactions, and broadcasts the echo. Each server commits its transaction to a log if its echo matches each echo received from the other servers. The present disclosure can detect byzantine failures and punish deviating servers by reconfiguring the plurality of servers that participate in the protocol.

CROSS REFERENCE TO RELATED APPLICATIONS

Pursuant to 35 U.S.C. § 119(e), this application is entitled to and claims the benefit of the filing date of U.S. Provisional App. No. 62/488,536 filed Apr. 21, 2017, the content of which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND

Digital technology provides support for conducting various transactions among parties (institutions, individuals, groups, etc.) such as secure record keeping (e.g., land title registries, wills, etc.), online voting, storing and managing binding contracts, secure online storage across multiple servers, and so on. Perhaps the most prominent technology considered in this context is the blockchain, which implements a secure peer-to-peer distributed (shared) ledger on top of a consensus engine to facilitate agreement on the history of transactions among parties.

A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of more than a hundred companies. Blockchain protocols for institutions (sometimes referred to as permissioned blockchains) take a conservative approach with their participants: Every participant is known and certified, so that it has to be responsible for its actions in the real world. Systems are intended to be deployed over a secure and reliable network. Therefore, proposed solutions for permissioned blockchains abandon the slow and energy-consuming proof-of-work paradigm (such as used in Bitcoin) to solve decentralized anonymous consensus, and tend to go back to more traditional distributed consensus protocols. Because of the high stakes involved, malicious deviations from the protocol, albeit expected to be rare, should never compromise safety. Such deviations are modeled as byzantine behavior, and to deal with such behavior, proposed solutions use byzantine fault tolerant (BFT) protocols for the consensus engine.

A motivation for participating in a distributed ledger system is that participating entities can receive a fee for every transaction they append to the ledger, and thus would want to maximize the rate of their transactions in the ledger. A fair distributed ledger that is resilient to selfish behavior is one that fairly (e.g., equally) divides the ledger among the participants when all the participants follow the protocol, and guarantees that a strategy that adheres to the protocol is the best strategy for every participant, for example, in terms of the number of transactions in the ledger.

BRIEF DESCRIPTION OF THE DRAWINGS

With respect to the discussion to follow and in particular to the drawings, it is stressed that the particulars shown represent examples for purposes of illustrative discussion, and are presented in the cause of providing a description of principles and conceptual aspects of the present disclosure. In this regard, no attempt is made to show implementation details beyond what is needed for a fundamental understanding of the present disclosure. The discussion to follow, in conjunction with the drawings, makes apparent to those of skill in the art how embodiments in accordance with the present disclosure may be practiced. Similar or same reference numbers may be used to identify or otherwise refer to similar or same elements in the various drawings and supporting descriptions. In the accompanying drawings:

FIG. 1A illustrates a distributed ledger system in accordance with some embodiments of the present disclosure.

FIG. 1B illustrates a distributed ledger system in accordance with other embodiments of the present disclosure.

FIG. 2 illustrates a computer in accordance with the present disclosure.

FIG. 3 shows a committee in accordance with the present disclosure.

FIGS. 4A, 4B, 4C illustrate different configurations of all-to-all communication.

FIG. 5 illustrates processing (sequencing) of a distributed ledger protocol in accordance with the present disclosure.

FIG. 6 illustrates communication among players during processing (sequencing) of a distributed ledger protocol in accordance with the present disclosure.

FIG. 7 is pseudocode to further illustrate the processing described in FIG. 5 with regard to committee members.

FIG. 8 is pseudocode to further illustrate the processing described in FIG. 5 with regard to the master service.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. Particular embodiments as expressed in the claims may include some or all of the features in these examples, alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein. It is understood that persons of ordinary skill in the relevant arts will readily recognize the concepts and techniques disclosed herein are applicable in any distributed system using a shared ledger.

The present disclosure addresses a number of important considerations that were not emphasized in conventional BFT protocols: (1) fairness among participants, and (2) optimization for failure-free performance while still offering a correct service in the presence of byzantine faults. In addition, it is important to stress protocol simplicity because complex protocols are inherently bug-prone and easier to attack. The present disclosure presents a system for a new permissioned blockchain protocol, which addresses the above issues. The system is fair, simple to understand, easy to implement, and works correctly with up to a minority of byzantine failures.

The present disclosure describes a distributed ledger protocol that can run in normal mode and in alert mode. Assuming that byzantine failures are expected to be rare, systems in accordance with the present disclosure can optimize for the case that byzantine failures do not occur; this case is called “normal mode.” In normal mode, the distributed ledger protocol can provide high performance when all players are correct (i.e., are not byzantine). Even with byzantine failures, the protocol is always correct and fair but may not guarantee progress due to the malicious actions of byzantine participants. If the system observes that byzantine participants are attempting to prevent progress (e.g., as a result of a computer hack or other break-in), the system can switch to “alert mode.” At this point, it is expected that real-world authorities, such as the FBI, Interpol, or other authority, will step in and investigate the break-in. But such an investigation takes time (days, weeks) to complete. Meanwhile, the system can remain operational by running the distributed ledger protocol in alert mode, albeit at the cost of degraded performance.

In accordance with aspects of the present disclosure, the system can designate a committee of players (e.g., participant banks) that runs the distributed ledger protocol to order all transactions. The system can include a master service to monitor the committee's progress (i.e., that each player in the committee is progressing through the protocol) and initiate reconfiguration of the committee in response to detecting non-progress among committee members. While executing in normal mode, byzantine players cannot violate safety but can prevent progress in the system, which can be detected by the master service. In response to detecting lack of progress (e.g., a player times out while executing in normal mode), the master service can switch the system to alert mode. A feature of executing in alert mode is that if players deviate from the protocol in a way that jeopardizes progress, they can be accurately detected and removed from the committee. A system in accordance with the present disclosure does not indict correct players. Indicting faulty components without accusing correct ones is a feature of the present disclosure that allows the system to heal itself following an attack.

FIG. 1A depicts an illustrative example of a shared ledger system 100 in accordance with aspects of the present disclosure. The system 100 may include a distribution of servers (players) 102 connected to a communication network 14. Client machines 12 may access the system 100, for example, by communicating with the servers 102 over the communication network 14. The servers 102 may be geographically spread across multiple sites, countries, institutions, and so on. The communication network 14 may comprise any combination of local area networks, public switched network, wide area network, and the like.

A master service 106 can monitor the progress of the servers 102. Generally, the master service 106 can select a set of servers 102 to define a committee (300, FIG. 3) to process transactions in accordance with the present disclosure. The master service 106 can reconfigure the committee to contain a different set of servers 102 in response to detecting that one or more members in the committee are not responding. This aspect of the present disclosure is discussed below.

As with shared ledger systems in general, each server 102 in system 100 can concurrently access a shared (common) ledger, and jointly attempt to agree on the order of transactions appended to the shared ledger. Each server 102 can be viewed as having an unbounded stream of transactions (Tx) that it wants to append to the shared ledger. The stream, for example, can come from the clients 12. We can assume that each server 102 receives a fee or some other benefit from every transaction it appends to the shared ledger, and so can be motivated to append as many transactions as possible. As noted above, a feature of the present disclosure is to ensure “fairness” among the servers 102, in other words, to provide each server 102 with equal opportunity for appending transactions to the shared ledger.

The shared ledger can be instantiated in any of several ways. FIG. 1A, for example, shows that in some embodiments the shared ledger can exist as replicated local copies 112 a, 112 b, 112 c, respectively stored in servers 102. Each server 102 may manage its respective local copy 112 a, 112 b, 112 c of the shared ledger. Processing in accordance with the present disclosure ensures correctness among the local copies 112 a-112 c of the shared ledger (i.e., that each copy contains the same sequence of transactions). Merely to illustrate another example of a shared ledger implementation, FIG. 1B illustrates a shared ledger instantiated as a globally managed ledger. Servers 102 may append their transactions to a global shared ledger 112 by sending requests to a global manager 108.

The servers 102, master service 106, and clients 12 shown in FIG. 1A can comprise any suitable computer system. FIG. 2, for example, shows an illustrative implementation of a computer system 202 having a processing unit 212, a system memory 214, and a system bus 211. The system bus 211 may connect various system components including, but not limited to, the processing unit 212, the system memory 214, an internal data storage device 216, and a communication interface 213. In a configuration where a client machine 12 is a mobile communication device (e.g., smart phone, computer tablet, etc.), the internal data storage 216 may or may not be included.

The processing unit 212 may comprise a single-processor configuration, or may be a multi-processor architecture. The system memory 214 may include read-only memory (ROM) and random access memory (RAM). The internal data storage device 216 may be an internal hard disk drive (HDD), a magnetic floppy disk drive (FDD, e.g., to read from or write to a removable diskette), an optical disk drive (e.g., for reading a CD-ROM disk, or to read from or write to other high capacity optical media such as the DVD, and so on). In a configuration where the computer system 202 is a mobile device, the internal data storage 216 may be a flash drive.

The internal data storage device 216 and its associated non-transitory computer-readable storage media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and a removable optical media such as a CD or DVD, it is noted that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used, and further, that any such media may contain computer-executable instructions for performing the methods disclosed herein.

The system memory 214 and/or the internal data storage device 216 may store various program and data modules 218, including for example, operating system 232, one or more application programs 234, program data 236, and other program/system modules 238. For example, in a computer system 202 configured as a server 102, execution of the application programs 234 may cause the computer system 202 to perform method steps in accordance with the present disclosure.

An external data storage device 242 may be connected to the computer system 202. For example, in the shared ledger system 100 of FIG. 1A, the external data storage device 242 of each server 102 may provide storage for its respective local copy 112 a, 112 b, 112 c of the shared ledger.

Access to the computer system 202 may be provided by a suitable input device 244 (e.g., keyboard, mouse, touch pad, etc.) and a suitable output device 246, (e.g., display screen). In a configuration where the computer system 202 is a mobile device, input and output may be provided by a touch sensitive display.

The computer system 202 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers (not shown) over a communication network 252. The communication network 252 may be a local area network (LAN) and/or larger networks, such as a wide area network (WAN).

A Note about Players

Shared ledger system 100 uses the distributed ledger protocol of the present disclosure to control access to the shared ledger. When a server 102 deviates from the protocol, the deviation can be due to a software or hardware bug, or to the server being hacked. A server 102 can suffer a byzantine failure and continue to function with arbitrary capricious behavior toward the other entities. Generally, byzantine failures include, among other faults, crash faults, message omissions, inconsistent messages, malicious intrusion, and in general any bugs or defects that may cause deviation from a prescribed behavior. The protocol should remain correct even if some servers 102 deviate from the protocol in an arbitrary manner. Such protocols are referred to as byzantine fault tolerant, and only a small subset of the servers 102 can be byzantine faulty. In accordance with the present disclosure, the protocol also takes into account that every server 102 may deviate from the protocol if doing so increases its benefit.

In accordance with the present disclosure, a committee is created to sequence through the protocol. A committee comprises a set of n servers 102 (referred to herein as “players”) that participate in sequencing of the protocol. A player is either a “correct player” or a “byzantine player.” A byzantine player can deviate arbitrarily from the protocol or otherwise exhibit any kind of faulty behavior. The number of byzantine players is bounded by a known parameter f<n, and we assume that byzantine players can collude, but correct players do not.

Shared Ledger

A shared ledger (e.g., 112 a, FIG. 1A or 112, FIG. 1B) comprises a log 114, which can be a data structure that stores a sequence of transactions from some domain that is defined by the high-level applications. In a banking context, for instance, the domain of transactions may include withdrawals and deposits between a bank and its clients, transfer of funds between banks, and so on. The shared ledger supports two operations, append( ) and read( ), with the following sequential specification: An append(Tx) operation changes the state of the ledger by appending operation Tx to the end of the ledger. A read( ) operation returns the ledger's state. Unless noted otherwise, “ledger” and the “log” that comprises the ledger may be used interchangeably.
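Merely to illustrate the sequential specification described above, the following is a minimal in-memory sketch. The class name `Ledger` and the use of strings to represent transactions are assumptions made for illustration only and are not part of the disclosure.

```python
from typing import List, Tuple


class Ledger:
    """Illustrative model of the shared ledger's sequential specification."""

    def __init__(self) -> None:
        # The log: an ordered sequence of transactions (strings here, by assumption).
        self._log: List[str] = []

    def append(self, tx: str) -> None:
        """Change the ledger's state by adding tx to the end of the log."""
        self._log.append(tx)

    def read(self) -> Tuple[str, ...]:
        """Return the ledger's state as an immutable snapshot."""
        return tuple(self._log)


ledger = Ledger()
ledger.append("deposit:alice:100")
ledger.append("withdraw:alice:30")
state = ledger.read()  # both transactions, in append order
```

Returning a tuple from `read()` is a design choice of this sketch: a caller cannot modify the log through the value it reads.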

In some embodiments, the “utility function” of a player can be defined as the ratio of transactions that it appends to the ledger, e.g., the number of transactions appended to the ledger by the player expressed as a percentage of the total number of transactions in the ledger. Intuitively, “fairness” means that every player gets an equal number of opportunities to append a transaction to the ledger. Thus, if player p₁ follows the protocol, then at any point when the ledger contains k transactions appended by p₁, the ledger does not contain more than k+1 transactions appended by any other player. This idea of ratio is more formally defined below, and extended to a case in which different players can be allocated different shares of the ledger and to deal with reconfiguring the committee of players in response to detection of lack of progress (e.g., when a player is not progressing through the protocol).
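The utility function and the intuitive fairness bound described above can be sketched as follows. In this sketch the ledger is represented, by assumption, as a sequence of player identifiers (one per appended transaction); the function names `utility` and `is_fair_for` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def utility(ledger: Sequence[Hashable], player: Hashable) -> float:
    """Fraction of ledger entries appended by `player` (0.0 for an empty ledger)."""
    if not ledger:
        return 0.0
    return sum(1 for owner in ledger if owner == player) / len(ledger)


def is_fair_for(ledger: Sequence[Hashable], player: Hashable) -> bool:
    """Check the intuitive fairness bound for `player` at every prefix.

    At each point, if `player` has appended k transactions, no other
    player may have appended more than k + 1.
    """
    counts: Counter = Counter()
    for owner in ledger:
        counts[owner] += 1
        k = counts[player]
        if any(c > k + 1 for p, c in counts.items() if p != player):
            return False
    return True
```

For example, a ledger whose entries were appended by p2, p2, p1 (in that order) is not fair for p1, because after the second entry p2 holds two transactions while p1 holds none.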

In accordance with the present disclosure, the distributed ledger protocol emulates an abstract ledger to a set of players that access it concurrently. A “linearization” of a concurrent execution r is a sequential execution that satisfies the shared ledger's sequential specification, which (1) includes all completed operations by players that follow the protocol, and (2) may or may not include pending (not completed) operations as well as operations performed by clients that deviate from the protocol. Intuitively, this means that the protocol tolerates players that do not cooperate by restricting the possible outcomes of their behavior to executing correct operations or leaving the ledger's state unchanged.

We consider two consistency (safety) properties: linearizability and sequential consistency. A fair distributed ledger protocol A is linearizable if every (concurrent) execution r of A has a linearization that preserves r's real-time order. A fair distributed ledger protocol A is sequentially consistent if every execution r of A has a linearization that preserves the real-time order of every player in r.

System Model

Some global aspects of the shared ledger system 100 in accordance with the present disclosure will now be discussed.

1. Certificates

In accordance with some embodiments, we assume that the players have been certified by some trusted certificate authority (CA) that is known to all other players. In addition, we assume the shared ledger system 100 provides a public key infrastructure (PKI): each player has a unique pair of public and private cryptographic keys, where the public keys are known to all players, and no coalition of players has enough computational power to unravel other players' private keys.

2. Timing

In accordance with some embodiments, we assume synchrony, which means that there is a known upper bound Δ on the time it takes for messages to arrive. This bound can be used to detect failures when the protocol is stuck because a byzantine player deviates from the protocol by not sending messages. For example, a player may report (e.g., to the master service 106) that it had not received a message in Δ time, or that a player had not sent a message in Δ time.
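As one possible illustration of how the bound Δ can be used, the sketch below lists the senders from which a player has not received a message within Δ. The function name `overdue_senders`, the use of numeric timestamps, and the dictionary of last-receipt times are assumptions of this sketch.

```python
from typing import Dict, List


def overdue_senders(last_received: Dict[str, float], now: float, delta: float) -> List[str]:
    """Return the senders from which no message has arrived within `delta` time units."""
    return sorted(p for p, t in last_received.items() if now - t > delta)


# p2 was last heard from 6.5 time units ago, which exceeds delta = 5.0.
suspects = overdue_senders({"p1": 10.0, "p2": 4.0}, now=10.5, delta=5.0)
```

A player could include such a list in its report to the master service 106.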

3. Byzantine Behavior

We assume that byzantine players are controlled by a strong adversary, which means that byzantine players can arbitrarily deviate from the protocol (e.g., crash, send incorrect messages, not send messages, etc.), lie about their state, and so on, and in particular, they can collude to try to violate the protocol properties. A byzantine player can be generally characterized as one who acts in a faulty manner.

4. Quality of Service

Above, we gave a simplistic definition of fairness that was based on all players being allowed to append transactions to the ledger at the same rate. However, this does not necessarily have to be the case in some embodiments. For example, slow players sometimes cannot sustain the throughput of faster players, so insisting on strict fairness can cause a decrease in total system throughput. In some instances, it is possible that some players deserve more throughput, e.g., because they pay more for the service or contribute more to the protocol. Therefore, we can generalize our fairness definition to allow general quality of service (QoS) allocations. Because QoS allocations can change over time, we define QoS-fairness for segments of the ledger. Denote by TX[i,j] a segment of the ledger from the i^(th) to the j^(th) entry (inclusive). QoS for the ledger segment TX[i,j] can be defined as follows:

Given a vector R=⟨r₁, r₂, . . . , r_(n)⟩ such that ∀i, 0≤r_(i)≤1 and $\sum_{i=1}^{n} r_{i} = 1$, we say that the segment TX[i,j] of a (sequential) ledger is R-fair if for every player p_(i) that follows the protocol, the number of transactions in TX[i,j] that were appended by p_(i) is at least ⌊|TX[i,j]|·r_(i)⌋. Note that the ledger fairness definition introduced above coincides with the R-fair ledger definition given here, when for every r_(i)∈R, r_(i)=1/n.
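The R-fair condition can be checked mechanically. The following sketch assumes, for illustration, that a ledger segment is given as a sequence of player identifiers (one per entry), that the shares are exact fractions, and that every player listed in `shares` follows the protocol; the function name `is_r_fair` is hypothetical.

```python
from fractions import Fraction
from math import floor
from typing import Dict, Hashable, Sequence


def is_r_fair(segment: Sequence[Hashable], shares: Dict[Hashable, Fraction]) -> bool:
    """Return True if each player owns at least floor(|segment| * r_i) entries."""
    if any(r < 0 or r > 1 for r in shares.values()):
        raise ValueError("each share must lie in [0, 1]")
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    size = len(segment)
    for player, share in shares.items():
        owned = sum(1 for owner in segment if owner == player)
        if owned < floor(size * share):
            return False
    return True


shares = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
ok = is_r_fair(["a", "b", "a", "c"], shares)      # a owns 2, b owns 1, c owns 1
starved = is_r_fair(["a", "a", "a", "b"], shares)  # c owns 0 but is owed 1
```

Exact fractions are used rather than floating-point numbers so that the sum-to-one check and the floor computation are not affected by rounding.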

System Architecture

FIG. 3 logically depicts the shared ledger system 100 with respect to two components: (1) a committee 300 of n players (e.g., servers 102) that runs the sequencing protocol to append transactions to the ledger, and (2) the master service 106, which is responsible for monitoring protocol progress and determining QoS allocations. In accordance with the present disclosure, there is a single (known to all) quorum (the committee 300), which partakes in agreeing on every operation. In some embodiments, the number of servers in the committee may be 2f+1, where f is the allowed maximum number of byzantine faulty servers in the committee. In some embodiments, the master service 106 may select the committee 300. The committee 300, for example, can be chosen based on the geographic location of the players, the load they can handle, and the like. To handle the case when committee members stop responding (e.g., due to a crash), the master service 106 is responsible for reconfiguration of the committee 300: detecting such players and replacing them.

The committee 300 runs the sequencing of the protocol using all-to-all communication, discussed below. The protocol is safe even if n−1 players are byzantine, but makes progress only if all the players participate and follow the protocol. This means that while any coalition consisting of n−1 byzantine players cannot violate safety, it takes only one byzantine player to stop the committee's progress. The role of the master service 106 is to monitor sequencing of the protocol and detect players that prevent progress. When lack of progress is observed (e.g., a player is not making progress in the protocol sequence because it has not sent or received a message within time period Δ), the master service 106 can run a recovery protocol to reconfigure the system. For instance, the master service 106 can remove byzantine players, add new players in place of byzantine players, and punish selfish players by reducing their ratio in the ledger (e.g., by changing that player's QoS allocation).

The master service 106 can be implemented in any of a number of suitable ways. In some embodiments, the master service 106 can be a single trusted authority. In other embodiments, the master service 106 can be emulated on top of a set of fault-prone players (e.g., using a suitable fault tolerant protocol), and so on.

1. QoS Adjustment

In addition to forming the committee 300, the master service 106 also determines the QoS allocations that should be enforced on the players comprising the committee 300. The initial value of the QoS allocations (e.g., R_(original)=⟨r₁, r₂, . . . , r_(n)⟩) can be chosen, for example, based on real world contracts among institutions, or by their available throughput or payment. When the master service 106 reconfigures the committee 300, it provides a new vector R_(new)=⟨r₁, r₂, . . . , r_(n)⟩ that represents the ratio that each player in the reconfigured committee should get in the ledger, where a player p_(i) is on the committee 300 if and only if r_(i)>0. The portion of the ledger decided by the new committee satisfies the QoS-fairness requirement with respect to R_(new). The master service's authority to modify QoS allocations can ensure that all players follow the protocol: Whenever the master service 106 detects a player that deviates from the protocol, it can immediately reduce the ratio of transactions allocated to the player. Thus, a player whose utility function is the ratio of transactions it appends to the ledger should prefer to collaborate rather than deviate in fear of having its ratio reduced.
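One way a new allocation vector could be derived from the current one is sketched below. The penalty factor and the proportional redistribution of the freed share among the remaining players are assumptions of this sketch; the disclosure does not prescribe how the reduced ratio is reallocated. The names `penalize` and `committee` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set


def penalize(shares: Dict[str, Fraction], offender: str,
             factor: Fraction = Fraction(1, 2)) -> Dict[str, Fraction]:
    """Scale the offender's share by `factor` and spread the freed share
    over the other players in proportion to their current shares."""
    if offender not in shares:
        raise KeyError(offender)
    if not 0 <= factor <= 1:
        raise ValueError("factor must lie in [0, 1]")
    reduced = shares[offender] * factor
    freed = shares[offender] - reduced
    others_total = sum(r for p, r in shares.items() if p != offender)
    if others_total == 0:
        return dict(shares)  # nobody to receive the freed share; leave unchanged
    return {
        p: reduced if p == offender else r + freed * r / others_total
        for p, r in shares.items()
    }


def committee(shares: Dict[str, Fraction]) -> Set[str]:
    """A player is on the committee if and only if its share is positive."""
    return {p for p, r in shares.items() if r > 0}


original = {p: Fraction(1, 3) for p in ("p1", "p2", "p3")}
punished = penalize(original, "p3")                # p3 drops to 1/6
evicted = penalize(original, "p3", Fraction(0))    # p3 drops to 0 and leaves the committee
```

In both cases the resulting shares still sum to 1, so the new vector remains a valid QoS allocation.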

2. Detectable Byzantine Broadcast

The ability of the master service 106 to use the punishment mechanism as well as to evict byzantine players relies on its ability to detect deviation from the protocol. In accordance with the present disclosure, the possible deviations can be divided into two categories: active and passive. An active deviation occurs when a player tries to break consistency or fairness by sending messages that do not coincide with the protocol. By signing all messages with private keys we achieve non-repudiation, i.e., messages can be linked to their senders and provide evidence of misbehavior. The master service 106 can inspect the signed messages to detect deviation from the protocol.

Passive deviation occurs when a player stalls the protocol by withholding messages (lack of progress). Even a single player can stop the protocol's progress by simply not sending messages. If the protocol hangs waiting for player p₁ to take an action following a message it expects from player p₂, we cannot know if p₂ is the culprit (because p₂ never sent a message to p₁ and so p₁ does not act within time period Δ) or if p₁ in fact is at fault. The master service 106, in such a situation, cannot decide which player(s) is deviating from the protocol and ought to be removed and replaced. In accordance with some embodiments of the present disclosure, communication among players can use an all-to-all communication paradigm to identify passive deviation by a faulty player(s).

In accordance with the present disclosure, the all-to-all communication paradigm can support a broadcast(m) operation (i.e., send to all) and a deliver(m) operation among the players, and a detect( ) operation for the master service 106. The broadcast( ) and deliver( ) operations express the idea that when a player p_(i) broadcasts a message to all the other players, that message is guaranteed to be delivered to all the other players. The detect( ) operation, performed by the master service 106, returns a set S of players that passively deviate from the protocol. For every two players (p_(b), p_(d)) such that p_(d) does not deliver a message from p_(b), S contains p_(b) when p_(b) did not perform the broadcast(m) operation properly, and otherwise, it contains p_(d). If S is empty, all the players follow the protocol, meaning that all the players broadcast a message and deliver messages broadcast by all other players.
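The rule that defines the set S can be sketched as follows. The inputs `delivered` (the sender/receiver pairs for which delivery occurred) and `broadcast_ok` (the players that performed the broadcast properly) are hypothetical; how the master service 106 obtains this evidence is not modeled here.

```python
from typing import Iterable, Set, Tuple


def detect(players: Iterable[str],
           delivered: Set[Tuple[str, str]],
           broadcast_ok: Set[str]) -> Set[str]:
    """Return the set S of players that passively deviate.

    For each pair (b, d) where d did not deliver b's message, blame b if b
    did not broadcast properly, and blame d otherwise.
    """
    members = list(players)
    suspects: Set[str] = set()
    for b in members:
        for d in members:
            if b == d or (b, d) in delivered:
                continue
            suspects.add(d if b in broadcast_ok else b)
    return suspects


players = ["p1", "p2", "p3"]
all_pairs = {(b, d) for b in players for d in players if b != d}

# p2 never broadcast, so nobody delivered a message from p2.
silent_sender = detect(players,
                       {pair for pair in all_pairs if pair[0] != "p2"},
                       broadcast_ok={"p1", "p3"})

# p1 broadcast properly, but p3 did not deliver p1's message.
deaf_receiver = detect(players,
                       all_pairs - {("p1", "p3")},
                       broadcast_ok={"p1", "p2", "p3"})
```

When every pair appears in `delivered`, the function returns an empty set, which corresponds to the case in which all the players follow the protocol.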

FIG. 4A illustrates one configuration of all-to-all communication, referred to as direct all-to-all, in which the broadcast(m) operation sends message m to all other players. As can be seen in FIG. 4A, each player p₁, p₂, p₃, p₄, p₅ broadcasts message m at time t1 to the other players.

Another configuration of all-to-all communication includes designating a subset of the players to act as “relays.” The use of relays is referred to as relayed all-to-all communication, and can be either active or passive. FIG. 4B illustrates an example of a configuration for active relayed all-to-all communication, in which players p₂, p₃, p₄ are designated as relays. In the active version, each player sends a message at time t1 to only the relays p₂, p₃, p₄. At time t2, when a relay p₂, p₃, p₄ receives a message for the first time, it sends it to all the players p₁, p₂, p₃, p₄, p₅.
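Active relayed all-to-all communication can be illustrated with a small two-step simulation. The representation of network links as a set of unordered pairs and the function name `active_relayed_all_to_all` are assumptions of this sketch; timing, failures, and signatures are not modeled.

```python
from typing import Dict, FrozenSet, Set


def active_relayed_all_to_all(messages: Dict[str, str],
                              relays: Set[str],
                              links: Set[FrozenSet[str]]) -> Dict[str, Dict[str, str]]:
    """Return, for each player, the messages it ends up holding (keyed by sender)."""

    def linked(a: str, b: str) -> bool:
        # A player can always hand a message to itself.
        return a == b or frozenset((a, b)) in links

    # Time t1: every player sends its message only to the relays.
    at_relay: Dict[str, Dict[str, str]] = {r: {} for r in relays}
    for sender, msg in messages.items():
        for r in relays:
            if linked(sender, r):
                at_relay[r].setdefault(sender, msg)

    # Time t2: each relay forwards every message it holds to all the players.
    delivered: Dict[str, Dict[str, str]] = {p: {} for p in messages}
    for r, held in at_relay.items():
        for sender, msg in held.items():
            for p in messages:
                if linked(r, p):
                    delivered[p].setdefault(sender, msg)
    return delivered


players = ["p1", "p2", "p3", "p4", "p5"]
relays = {"p2", "p3", "p4"}
# Links exist only between players and relays; p1 and p5 have no direct link.
links = {frozenset((p, r)) for p in players for r in relays if p != r}
msgs = {p: "m-" + p for p in players}
result = active_relayed_all_to_all(msgs, relays, links)
```

In this example p5 still obtains the message of p1 even though the two players share no direct link, because each relay forwards what it received at time t1.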

FIG. 4C illustrates an example of a configuration for passive relayed all-to-all communication, in which the relays p₂, p₃, p₄ do not automatically send messages to all players. Instead, the players use direct all-to-all communication, and only when a player does not receive a message from some other player does that player request the message from the relays. FIG. 4C, for example, shows at time t1 that all players send to all other players (direct all-to-all). If at time t2 one or more players time out waiting for message m, those players can then request message m from relays p₂, p₃, p₄ (to which it had already been delivered at time t1).

It is noted that even though direct all-to-all exhibits a lower number of network hops, it is not always feasible. For example, if the system 100 is deployed on top of private physical links, such links might not necessarily exist among all pairs of players. It is further noted that active relayed all-to-all communication does not necessarily have worse latency than direct all-to-all, since the latter depends on the slowest link, while in relayed all-to-all communication players with the fastest links can be designated the relays.

The discussion will now turn to a description of sequencing through the distributed ledger protocol of the present disclosure. The protocol works in epochs. In each epoch, every participating player gets an opportunity to append one transaction or one fixed-size batch of transactions to the shared ledger (e.g., 112 a, FIG. 1A), via the append( ) operation. The key mechanism to ensure fairness is to commit all the transactions to the ledger atomically (all or nothing). Since we assume that players have infinite streams of transactions, they will always have enough transactions to append; otherwise, they can always append an empty (dummy) transaction.
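The all-or-nothing commit of an epoch can be sketched as follows. The function name `commit_epoch`, the representation of the log as a list of (player, transaction) pairs, the ordering by player identifier, and the use of an empty string as the dummy transaction are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

EMPTY_TX = ""  # hypothetical stand-in for an empty (dummy) transaction


def commit_epoch(log: List[Tuple[str, str]],
                 epoch_txs: Dict[str, str],
                 committee: Iterable[str]) -> bool:
    """Append exactly one entry per committee member, or nothing at all."""
    members = sorted(committee)
    if set(epoch_txs) != set(members):
        return False  # some member's transaction is missing: commit nothing
    log.extend((p, epoch_txs[p]) for p in members)
    return True


log: List[Tuple[str, str]] = []
members = ["p1", "p2", "p3"]
# p3's transaction is missing, so the epoch is not committed.
first = commit_epoch(log, {"p1": "a", "p2": "b"}, members)
# p3 contributes a dummy transaction, so the whole epoch commits.
second = commit_epoch(log, {"p1": "a", "p2": "b", "p3": EMPTY_TX}, members)
```

Because each committed epoch adds exactly one entry per player, the number of entries held by any two players in the log never differs.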

In accordance with the present disclosure, the append( ) operation comprises three communication rounds between players in the committee 300. Referring to FIG. 5 and the accompanying system diagram of FIG. 6, the discussion will now turn to a high level description of sequencing through the distributed ledger protocol of the present disclosure by each player (e.g., servers 102, FIG. 1A) to execute the append( ) operation. In some embodiments, for example, each player may include computer executable program code, which when executed by the player's computer system (e.g., 202, FIG. 2) can cause the player to perform processing in accordance with FIG. 5.

At step 502, each player in the committee receives a transaction request from a client. Referring to FIG. 6, for example, the client sends transaction Tx. Each player p₁, p₂, p₃ receives its respective copy Tx₁, Tx₂, Tx₃ of the transaction. This can be deemed the beginning of an epoch. In Round 1 of communications, all players in the committee will broadcast their respective transactions to all players.

Round 1

At step 504, each player broadcasts its copy of the transaction received from the client to all the other players. In some embodiments, each player may accumulate several transactions from the same client or different clients, and begin the epoch with a block of transactions. For explanation purposes, the discussion will refer to transaction Tx with the understanding that Tx can mean either a single transaction or a block of transactions. Referring to FIG. 6, player p₁ sends its copy Tx₁ of the transaction to players p₂, p₃. Likewise, player p₂ sends its copy Tx₂ of the transaction to players p₁, p₃, and the same for player p₃.

At step 506, each player receives from every other player a copy of that other player's transaction. Referring to FIG. 6, player p₁ receives a copy Tx₂, Tx₃ respectively from players p₂, p₃. Player p₂ receives a copy Tx₁, Tx₃ respectively from players p₁, p₃, and likewise player p₃ receives a copy Tx₁, Tx₂ respectively from players p₁, p₂. This marks the end of Round 1 of communications. In Round 2, all players will validate that fairness exists.

Round 2

At step 508, each player produces a signed hash of the set of transactions Tx₁-Tx₃ it received at the end of Round 1. Referring to FIG. 6, each player p₁, p₂, p₃ inserts its respective copy Tx₁, Tx₂, Tx₃ of the transaction into the set of transactions it received at the end of Round 1. Thus, player p₁ adds Tx₁ to its set of received transactions Tx₂, Tx₃. Player p₂ adds Tx₂ to its set of received transactions Tx₁, Tx₃, and player p₃ adds Tx₃ to its received set of transactions Tx₁, Tx₂. At this point, each player should have the same set of transactions Tx₁-Tx₃.

Each player can order its set of transactions Tx₁-Tx₃ according to a deterministic rule (e.g., a sorting algorithm). Each player orders its set of transactions Tx₁-Tx₃ using the same rule, to generate an ordered set of transactions 602. Each player can hash its ordered set of transactions 602 to generate a hash value according to a suitable hash function (e.g., SHA1). Each player uses the same hash function to generate a hash value. FIG. 6 shows each player p₁, p₂, p₃ having its respective hash value H1, H2, H3. At this point, each player should have the same hash value. Each player p₁, p₂, p₃ can sign its respective hash value H1, H2, H3 using its respective private key to generate a respective signed hash (echo) 604 a, 604 b, 604 c.
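The ordering and hashing performed at step 508 can be illustrated with a short sketch. This is not the disclosed implementation: the function name echo_hash, the choice of byte-wise sorting as the deterministic rule, the length-prefixed encoding, and the use of SHA-256 (rather than the SHA1 example above) are assumptions made for illustration, and signing is omitted.

```python
import hashlib


def echo_hash(transactions):
    """Hash a set of transactions in a deterministic order.

    Assumptions (not taken from the disclosure): transactions are byte
    strings, the deterministic rule is byte-wise sorting, and every
    transaction is length-prefixed so the concatenation is unambiguous.
    """
    h = hashlib.sha256()
    for tx in sorted(transactions):
        h.update(len(tx).to_bytes(4, "big"))
        h.update(tx)
    return h.hexdigest()


# Each player inserts its own transaction into the set it received in Round 1.
p1_view = [b"Tx1", b"Tx2", b"Tx3"]  # own Tx1 plus received Tx2, Tx3
p2_view = [b"Tx2", b"Tx1", b"Tx3"]  # own Tx2 plus received Tx1, Tx3
h1 = echo_hash(p1_view)
h2 = echo_hash(p2_view)
```

Because both players apply the same rule and the same hash function, h1 and h2 are equal even though the transactions arrived in different orders; a view that is missing a transaction would produce a different value.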

At step 510, each player p₁, p₂, p₃ broadcasts its respective signed hash 604 a, 604 b, 604 c to all the other players. Referring to FIG. 6, player p₁ sends its signed hash 604 a to players p₂, p₃, and likewise for player p₂ and player p₃.

At step 512, each player receives from every other player the signed hash of that other player. FIG. 6 shows that player p₁ receives signed hashes 604 b, 604 c, player p₂ receives signed hashes 604 a, 604 c, and player p₃ receives signed hashes 604 a, 604 b.

At step 514, each player validates that all players signed the same hash of transactions. Each player can decrypt the received signed hashes using the respective public keys of the other players, and compare the recovered hash values. Referring to FIG. 6, for example, player p₁ can decrypt the signed hash 604 b it received from player p₂ using the public key of player p₂ to recover p₂'s hash value H2, which can then be compared against p₁'s hash value H1. Player p₁ can repeat the process for the signed hash 604 c it received from player p₃, comparing p₃'s recovered hash value H3 against p₁'s hash value H1. Players p₂ and p₃ can perform the same processing on their respective received signed hashes 604 a, 604 c and 604 a, 604 b. This marks the end of Round 2 of communication.

At the end of Round 2, if a player validates that all other players signed the same hash of transactions, that player can commit its transaction received from the client. Note that we achieve fairness by forcing all players to wait for all other players and to verify the signed hashes (step 514). If a player omits another player's transaction during Round 2, the omission will be detected because the hash values compared at step 514 will not match. An epoch is committed only if all the players signed the same hash value, and since each player signs a hash that contains its own transaction, either all the players' transactions appear in the epoch and the epoch can be committed, or the epoch is not committed. In Round 3, all players can ensure recoverability in case reconfiguration is needed.
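The all-or-nothing check of step 514 can be sketched as follows. The function name round2_validates and the dictionary layout are hypothetical, and the sketch assumes that signature verification has already been performed, so that received_hashes holds only hash values recovered from correctly signed echoes.

```python
def round2_validates(own_hash, received_hashes, committee_size):
    """Return True only if every other committee member echoed own_hash.

    received_hashes maps a player id to the hash value recovered from that
    player's signed echo (signature verification is assumed to be done).
    """
    if len(received_hashes) != committee_size - 1:
        return False  # an echo is still missing, so the player keeps waiting
    return all(h == own_hash for h in received_hashes.values())


# Player p1's view in a three-player committee.
all_match = round2_validates("H", {"p2": "H", "p3": "H"}, 3)
omission = round2_validates("H", {"p2": "H", "p3": "H-without-Tx1"}, 3)
missing = round2_validates("H", {"p2": "H"}, 3)
```

Only the first call validates. The second models a player that omitted a transaction and therefore hashed a different set, and the third models an echo that has not yet arrived.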

Round 3

At step 516, each player sends a “ready to commit” message to all other players. In some embodiments, the message can be signed by that player's private key.

At step 518, each player receives a signed “ready to commit” message from the other players. Each player can determine it has received the “ready to commit” message from the other players using the other players' respective public keys. When a player confirms that it has received the “ready to commit” message from all other players, it can append its transaction to the shared ledger (e.g., 112 a, FIG. 1A), and send a positive response to the client. The client can deem that its transaction Tx (step 502) has been committed when it receives a positive response from all the players it sent the transaction Tx to. If the client does not receive a positive response from all the players, it can conclude that the transaction Tx has not been committed and can take appropriate action.

FIG. 7 illustrates the processing of Rounds 1, 2, and 3 in terms of pseudocode. For clarity, signature manipulation is not shown, although it is understood that all the messages are signed and verified. The pseudocode assumes the QoS allocation is equal for all players.

A Note about Read( ) Operations

Since all players make progress together, they all have up-to-date local ledgers. Therefore, a read(l) operation simply returns the last l committed transactions in the local ledger, where for every returned epoch k with sequence of transactions st, it must attach a proof that st can be committed. The attached proof is needed to make sure byzantine players do not lie about committed transactions. Such a proof is either (1) a newConfig message from the master that includes st (more details below), or (2) f+1 epoch k Round 3 messages, each of which contains a hash of st.
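A reader-side check of the two kinds of proof might look like the following sketch. The function name proof_is_valid and its parameters are hypothetical, and the sketch assumes that all signatures have already been verified, so only the hash values carried by the messages are compared.

```python
def proof_is_valid(st_hash, round3_hashes, f, newconfig_hash=None):
    """Check a commit proof for a sequence of transactions with hash st_hash.

    round3_hashes maps a player id to the hash carried in that player's
    epoch-k Round 3 message; newconfig_hash is the hash carried in a
    newConfig message from the master, if one is attached.
    Signature verification is assumed to have happened already.
    """
    # Option (1): a newConfig message from the master that includes st.
    if newconfig_hash is not None and newconfig_hash == st_hash:
        return True
    # Option (2): f+1 Round 3 messages, each containing a hash of st.
    supporters = [p for p, h in round3_hashes.items() if h == st_hash]
    return len(supporters) >= f + 1


enough = proof_is_valid("h(st)", {"p1": "h(st)", "p2": "h(st)"}, f=1)
too_few = proof_is_valid("h(st)", {"p1": "h(st)", "p2": "other"}, f=1)
from_master = proof_is_valid("h(st)", {}, f=1, newconfig_hash="h(st)")
```

With f=1, two matching Round 3 messages suffice, a single matching message does not, and a matching newConfig message is sufficient on its own.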

Supporting Quality of Service

To support non-uniform quality of service, we might not want all players to participate in every epoch. In some embodiments, for example, the master service 106 can deterministically (without communication) determine which players should participate in every epoch according to the latest configuration of vector R. For example, for players p₁, p₂, p₃ and vector R = ⟨½, ¼, ¼⟩, p₁ should participate in every epoch while p₂ and p₃ participate in every other epoch. Before the start of each epoch, the master service 106 can inform each player of the list of players who will participate in that epoch.
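One possible deterministic rule that reproduces the example with shares ½, ¼, ¼ is sketched below. The disclosure does not specify the rule; the function name participants, the normalization by the largest share, and the floor-based test are assumptions chosen only so that the schedule can be computed from R and the epoch number without communication.

```python
from fractions import Fraction
from math import floor


def participants(shares, epoch):
    """Return the players that take part in the given epoch.

    Assumed rule: a player's rate is its share divided by the largest
    share, and it participates in epoch k whenever floor((k+1)*rate)
    exceeds floor(k*rate). The rule depends only on R and k.
    """
    top = max(shares.values())
    chosen = []
    for player in sorted(shares):
        rate = shares[player] / top  # equals 1 for the largest share
        if floor((epoch + 1) * rate) > floor(epoch * rate):
            chosen.append(player)
    return chosen


R = {"p1": Fraction(1, 2), "p2": Fraction(1, 4), "p3": Fraction(1, 4)}
schedule = [participants(R, k) for k in range(4)]
```

Under this rule p1 appears in all four epochs and p2 and p3 appear in two of them, so over the four epochs the ledger entries are split 4:2:2, matching the shares in R.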

Data Vs. Metadata Optimization

The first round (Round 1) of sequencing the protocol includes exchanging transactions (data), the second round (Round 2) includes exchanging hashes of the transactions (metadata), and the last round (Round 3) includes exchanging commit messages (metadata). Communication in the first round is much more expensive than in the last two rounds. In order to increase throughput, data can be decoupled from metadata. In some embodiments, for example, we can perform the first rounds of each epoch out-of-order: we do not wait for epoch number k to commit before starting the first round of epoch k+1. Instead, we broadcast transactions in every epoch (i.e., execute the first round) as soon as possible. However, in order to be able to validate the transactions, we perform rounds 2 and 3 sequentially: we start round 2 of epoch k+1 only after epoch k commits. In other words, we can divide our communication into a data path and a metadata path, where the data path is out-of-order and the metadata path is responsible for ordering the data.
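The split between an out-of-order data path and a sequential metadata path can be sketched with a small bookkeeping class. The class name EpochPipeline, its method names, and the convention that epochs are numbered from 0 are hypothetical; the sketch tracks only when Round 2 of an epoch may begin and does not model the network.

```python
class EpochPipeline:
    """Data path is out of order; metadata path (Rounds 2 and 3) is sequential."""

    def __init__(self):
        self.round1_data = {}     # epoch -> transactions gathered in Round 1
        self.last_committed = -1  # assumes epochs are numbered from 0

    def receive_round1(self, epoch, transactions):
        # Data path: accept Round 1 data for any epoch, in any order.
        self.round1_data[epoch] = list(transactions)

    def can_start_round2(self, epoch):
        # Metadata path: epoch k+1 starts Round 2 only after epoch k commits.
        return epoch == self.last_committed + 1 and epoch in self.round1_data

    def commit(self, epoch):
        if not self.can_start_round2(epoch):
            raise ValueError("epoch %d is not next in order" % epoch)
        self.last_committed = epoch


pipe = EpochPipeline()
pipe.receive_round1(1, [b"Tx1", b"Tx2"])  # data for epoch 1 arrives first
pipe.receive_round1(0, [b"Tx3", b"Tx4"])
```

Here the data for epoch 1 is already buffered, but Round 2 of epoch 1 cannot start until epoch 0 has committed.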

Recovery

Referring to the pseudocode in FIG. 8, the master service 106 monitors the system 100 by periodically asking the players for their status (e.g., the last committed epoch and received messages in the current epoch), and whenever it observes that some player p_(i) waits longer than 2Δ to deliver a message that another player p_(j) should have broadcast, it invokes detect( ) to learn who is misbehaving and should be punished. See, for example, lines 3-8 in FIG. 8.

Then, after the master service 106 decides to remove or punish a player, it starts a recovery mode. First, it stops the current configuration and learns its closing state by sending a reconfig message to all players. When a player receives a reconfig message from the master service 106, it: (1) stops sending messages in the current configuration, (2) sends its local status to the master service 106, and (3) waits for a newConfig message from the master service 106. See, for example, lines 9-18 in FIG. 8.

Since we use PKI, proving active deviations is readily accomplished, and every time a player gets a proof of active deviation, it sends it to the master. One example appears in the pseudocode in FIG. 7 at lines 16-17: a player gets two different hashes (corresponding to different sequences of transactions) in the second round, in which case, to ensure correctness, it cannot move on to round three. Instead, it complains to the master and waits for reconfiguration. When the master receives both hashes, it checks which of the players signed two different transactions in the first round and issues a reconfiguration to remove this player. Other active deviations, e.g., incorrect message formats, are handled in a similar way; for simplicity, however, we omit this from the pseudocode.
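The master's check for a player that signed two different transactions in the first round can be sketched as follows. The function name find_equivocators and the report format are hypothetical, and the sketch assumes the master has already verified that each reported Round 1 message carries a valid signature of its sender.

```python
from collections import defaultdict


def find_equivocators(reports):
    """Return the senders that signed more than one distinct transaction.

    reports is an iterable of (reporter, sender, transaction) tuples, each
    describing a signed Round 1 message that `reporter` received from
    `sender`. Signatures are assumed to have been verified by the master.
    """
    seen = defaultdict(set)
    for _reporter, sender, transaction in reports:
        seen[sender].add(transaction)
    return sorted(s for s, txs in seen.items() if len(txs) > 1)


reports = [
    ("p1", "p2", b"Tx2"),
    ("p1", "p3", b"Tx3"),
    ("p2", "p3", b"Tx3-other"),  # p3 sent p2 a different transaction
    ("p3", "p2", b"Tx2"),
]
culprits = find_equivocators(reports)
```

In this example p3 is identified because two players hold differently signed Round 1 messages from it, which explains why their Round 2 hashes differ.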

State Transfer

Note that while a byzantine player cannot make the master service 106 believe that an uncommitted epoch was committed (a committed epoch must be signed by all players participating in that epoch), it can omit the last committed epoch when asked (by the master service 106) about its local state. Such behavior can potentially lead to a safety violation: suppose that some byzantine player P does not broadcast its last message in the last communication round in epoch number k, but delivers messages from all other players. In this case, P has a proof that epoch number k is committed, meaning that a following read operation may return the transactions in it. However, no other player holds a proof that epoch k is committed, and the master service 106 can potentially miss this committed epoch if P reports that epoch k was never committed. In this case, the new configuration will start from epoch number k−1 and commit different transactions in epoch number k, which will lead to a safety violation when a read operation is performed.

The third round of the epoch is used to overcome this potential problem. If the master service 106 observes that some player receives all messages in the second round of epoch k (FIG. 8, line 14), it can conclude that some non-byzantine player may have committed this epoch. Therefore, in this case, the master can include epoch k in the closing state. Since the private keys of byzantine players are unavailable to the master, it signs the epoch with its own private key, and sends it to all players in the new configuration (committee) as the opening state. A player that sees an epoch with the master's signature treats it as if it were signed by all players. Recall that the master is a trusted entity, which can be emulated by a BFT protocol.
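The master's decision on whether to include epoch k in the closing state can be sketched as follows. The function name closing_epoch and the status format are hypothetical; the sketch assumes each status reply lists the players whose verified Round 2 messages for epoch k the replying player holds, with the replying player's own echo counted as held.

```python
def closing_epoch(statuses, committee, k):
    """Return the last epoch to include in the closing state.

    statuses maps a player id to the set of player ids whose verified
    epoch-k Round 2 messages that player reports holding (its own echo
    included, by assumption). If any player holds Round 2 messages from
    the whole committee, epoch k may have been committed and is included.
    """
    everyone = set(committee)
    for held in statuses.values():
        if everyone <= held:
            return k
    return k - 1


committee = ["p1", "p2", "p3"]
statuses = {
    "p1": {"p1", "p2", "p3"},  # p1 received every Round 2 message
    "p2": {"p1", "p2"},
}
included = closing_epoch(statuses, committee, k=7)
excluded = closing_epoch({"p2": {"p1", "p2"}}, committee, k=7)
```

In the first case p1's status shows a complete set of Round 2 messages, so epoch 7 is included; in the second case no reply shows a complete set and the closing state ends at epoch 6.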

Protocol Analysis

To prove that the disclosed protocol is correct, we need to show that (1) it is safe and fair in case all the players follow the protocol and there are at most f<n/2 byzantine players, and (2) following the protocol is a dominating strategy for every player. We refer to a player that follows the protocol as a follower.

Safety. First, we show that there are always at least f+1 followers on the committee. Initially, the committee consists of n≥2f+1 players, of which at most f are byzantine; that is, at least f+1 members follow the protocol. Since the master service 106 never removes a player that follows the protocol when there are at least f+1 such members, there are always at least f+1 followers on the committee.

Now we show that if one player commits a sequence of transactions in epoch k, no other committee member commits a different sequence of transactions in epoch k. Note that in order to commit a sequence of transactions st, players must have proof that st is allowed to be committed. One option for such proof is to have a ⟨newConfig, *, k, h(st)⟩ message from the master service 106, and another option is to have Round 3 messages from f+1 committee members, each of which contains a hash of st.

First, note that any two members that commit epoch k after receiving newConfig from the master service 106 commit the same sequence of transactions. Second, since all followers send the same hash of transactions to all committee members in Round 2, all followers that send Round 3 messages include the same hash therein. And since players that commit with the second option must have in the proof at least one message from a follower, all players that commit with the second option commit the same sequence.

We are left to show that members that commit with the first option and members that commit with the second one commit the same sequence of transactions. Let p_(j) be a committee member that commits a sequence of transactions st with the second option. Since p_(j) received f+1 messages in Round 3, it received a Round 3 message from a member p_(f) that follows the protocol. Moreover, p_(f) sent a Round 3 message to p_(j) before it received a reconfig message from the master service 106. In addition, since p_(f) sent a Round 3 message, it received Round 2 messages that contain a hash of st from all committee members before it received reconfig from the master service 106. Therefore, p_(f) includes all the Round 2 messages that it received in the status reply to the master service 106, and thus the master service 106 includes st in its closing state and sends a ⟨newConfig, *, k, h(st)⟩ message to the new committee. Hence, all members that commit with the first option commit st as well.

Fairness.

We need to show that every committed epoch contains transactions of all committee members. First, note that the hash of transactions each player sends in Round 2 contains its own transaction. Second, a player commits a sequence of transactions st only if some player receives the hash of st from all committee members in the second round. Therefore, a player commits a sequence of transactions only if it contains transactions of all committee members.

These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s). As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The above description illustrates various embodiments of the present disclosure along with examples of how aspects of the present disclosure may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present disclosure as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope of the disclosure as defined by the claims. 

1. A method for building agreement among a plurality of servers, the method comprising: receiving by a first server among the plurality of servers a client transaction that is broadcast to all the servers by a client; broadcasting by the first server its received client transaction to all other servers; receiving by the first server copies of the client transaction from all other servers; producing by the first server an echo that is representative of both its received client transaction and copies of the client transaction from all other servers; broadcasting by the first server its echo to all other servers; receiving by the first server echoes from all other servers; and committing by the first server its received client transaction only if its echo matches each echo received from each server among the plurality of servers.
 2. The method of claim 1, wherein committing the received client transaction of the first server includes the first server appending its received client transaction as an entry in a ledger that is shared among the plurality of servers, wherein each server among the plurality of servers has an equal share of entries in the ledger.
 3. The method of claim 2, wherein each of the plurality of servers maintains its respective copy of the shared ledger.
 4. The method of claim 2, wherein the ledger is a globally accessed ledger.
 5. The method of claim 1, wherein producing the echo by the first server includes the first server generating a hash value using both its received client transaction and copies of the client transaction received from all other servers.
 6. The method of claim 5, wherein producing the echo by the first server further includes the first server digitally signing its hash value.
 7. The method of claim 1, further comprising using relays designated from among the plurality of servers to broadcast information from one server to all other servers.
 8. The method of claim 1, wherein the plurality of servers comprises at most 2f+1 servers.
 9. The method of claim 1, being performed by each server among the plurality of servers.
 10. A non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a processing unit in a first server among a plurality of servers, cause the first server to: receive a client transaction that is broadcast to all the servers by a client; broadcast the received client transaction of the first server to all other servers; receive copies of the client transaction from all other servers; produce an echo that is representative of both the received client transaction of the first server and copies of the client transaction from all other servers; broadcast the echo of the first server to all other servers; receive echoes from all other servers; and commit the received client transaction of the first server only if the echo of the first server matches each echo received from each server among the plurality of servers.
 11. The non-transitory computer-readable storage medium of claim 10, wherein committing the received client transaction of the first server includes the first server appending its received client transaction as an entry in a ledger that is shared among the plurality of servers, wherein each server among the plurality of servers has an equal share of entries in the ledger.
 12. The non-transitory computer-readable storage medium of claim 10, wherein producing the echo includes the first server generating a hash value using both its received client transaction and copies of the client transaction received from all other servers.
 13. The non-transitory computer-readable storage medium of claim 12, wherein producing the echo further includes the first server digitally signing its hash value.
 14. The non-transitory computer-readable storage medium of claim 10, further comprising using relays designated from among the plurality of servers to broadcast information from one server to all other servers.
 15. A system comprising a plurality of servers, each server comprising: one or more computer processors, and a computer-readable storage medium comprising instructions for controlling the one or more computer processors to be operable to: receive by said each server a client transaction that is broadcast to all the servers by a client; broadcast by said each server its received client transaction to all other servers; receive by said each server copies of the client transaction from all other servers; produce by said each server an echo that is representative of both its received client transaction and copies of the client transaction from all other servers; broadcast by said each server its echo to all other servers; receive by said each server echoes from all other servers; and commit by said each server its received client transaction only if its echo matches each echo received from each server among the plurality of servers.
 16. The system of claim 15, wherein committing the received client transaction of said each server includes said each server appending its received client transaction as an entry in a ledger that is shared among the plurality of servers, wherein the plurality of servers, each, has an equal share of entries in the ledger.
 17. The system of claim 15, wherein producing the echo by said each server includes said each server: generating a hash value using both its received client transaction and copies of the client transaction received from all other servers; and digitally signing the hash value.
 18. The system of claim 15, further comprising using relays designated from among the plurality of servers to broadcast information from one server to all other servers.
 19. The system of claim 15, wherein the plurality of servers comprises at most 2f+1 servers.
 20. The system of claim 15, being performed by each server among the plurality of servers. 